{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Interpretable Regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we will fit regression explainable boosting machine (EBM), LinearRegression, and RegressionTree models. After fitting them, we will use their glassbox nature to understand their global and local explanations.\n",
    "\n",
    "This notebook can be found in our [**_examples folder_**](https://github.com/interpretml/interpret/tree/develop/docs/interpret/python/examples) on GitHub."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# install interpret if not already installed\n",
    "try:\n",
    "    import interpret\n",
    "except ModuleNotFoundError:\n",
    "    !pip install --quiet interpret scikit-learn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import fetch_california_housing\n",
    "from sklearn.model_selection import train_test_split\n",
    "from interpret import show\n",
    "from interpret.perf import RegressionPerf\n",
    "\n",
    "from interpret import set_visualize_provider\n",
    "from interpret.provider import InlineProvider\n",
    "set_visualize_provider(InlineProvider())\n",
    "\n",
    "dataset = fetch_california_housing()\n",
    "X = dataset.data\n",
    "y = dataset.target\n",
    "names = dataset.feature_names\n",
    "\n",
    "seed = 42\n",
    "np.random.seed(seed)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Explore the dataset</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret import show\n",
    "from interpret.data import Marginal\n",
    "\n",
    "marginal = Marginal(names).explain_data(X_train, y_train, name='Train Data')\n",
    "show(marginal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Train the Explainable Boosting Machine (EBM)</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret.glassbox import ExplainableBoostingRegressor, LinearRegression, RegressionTree\n",
    "\n",
    "ebm = ExplainableBoostingRegressor(names, interactions=3)\n",
    "ebm.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>EBMs are glassbox models, so we can edit them</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# post-process monotonize the MedInc feature\n",
    "ebm.monotonize(\"MedInc\", increasing=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Global Explanations: What the model learned overall</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ebm_global = ebm.explain_global(name='EBM')\n",
    "show(ebm_global)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Local Explanations: How an individual prediction was made</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ebm_local = ebm.explain_local(X_test[:5], y_test[:5], name='EBM')\n",
    "show(ebm_local, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Evaluate EBM performance</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ebm_perf = RegressionPerf(ebm, names).explain_perf(X_test, y_test, name='EBM')\n",
    "show(ebm_perf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Let's test out a few other Explainable Models</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from interpret.glassbox import LinearRegression, RegressionTree\n",
    "\n",
    "lr = LinearRegression(names)\n",
    "lr.fit(X_train, y_train)\n",
    "\n",
    "rt = RegressionTree(names, random_state=seed)\n",
    "rt.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Compare performance using the Dashboard</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr_perf = RegressionPerf(lr, names).explain_perf(X_test, y_test, name='Linear Regression')\n",
    "show(lr_perf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rt_perf = RegressionPerf(rt, names).explain_perf(X_test, y_test, name='Regression Tree')\n",
    "show(rt_perf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2>Glassbox: All of our models have global and local explanations</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr_global = lr.explain_global(name='Linear Regression')\n",
    "show(lr_global)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rt_global = rt.explain_global(name='Regression Tree')\n",
    "show(rt_global)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
